11 research outputs found

    Heavy-tailed Independent Component Analysis

    Independent component analysis (ICA) is the problem of efficiently recovering a matrix $A \in \mathbb{R}^{n\times n}$ from i.i.d. observations of $X = AS$, where $S \in \mathbb{R}^n$ is a random vector with mutually independent coordinates. This problem has been intensively studied, but all existing efficient algorithms with provable guarantees require that the coordinates $S_i$ have finite fourth moments. We consider the heavy-tailed ICA problem where we do not make this assumption, even about the second moment. This problem also has received considerable attention in the applied literature. In the present work, we first give a provably efficient algorithm that works under the assumption that for constant $\gamma > 0$, each $S_i$ has finite $(1+\gamma)$-moment, thus substantially weakening the moment requirement condition for the ICA problem to be solvable. We then give an algorithm that works under the assumption that matrix $A$ has orthogonal columns but requires no moment assumptions. Our techniques draw ideas from convex geometry and exploit standard properties of the multivariate spherical Gaussian distribution in a novel way. Comment: 30 pages
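
To make the data model in this abstract concrete, here is a minimal sketch of drawing observations X = AS with heavy-tailed independent sources. It is not the paper's algorithm; it only generates inputs of the kind such an algorithm would receive. The choice of symmetrized Pareto sources and the names `sample_source`, `mix`, and `sample_observations` are assumptions made for illustration. A Pareto variable with shape `alpha` has a finite moment of order p only for p < alpha, so `alpha = 1.5` gives sources with a finite (1+γ)-moment for γ < 0.5 but infinite variance.

```python
import random


def sample_source(alpha, rng):
    """Draw one symmetric heavy-tailed value (assumed source distribution).

    The magnitude is Pareto with shape `alpha` and support [1, inf);
    a random sign makes the distribution symmetric about zero.
    """
    magnitude = rng.paretovariate(alpha)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude


def mix(A, s):
    """Return the matrix-vector product A s for a list-of-rows matrix A."""
    return [sum(a_ij * s_j for a_ij, s_j in zip(row, s)) for row in A]


def sample_observations(A, alpha, count, seed=0):
    """Draw `count` i.i.d. observations X = A S with independent coordinates of S."""
    rng = random.Random(seed)
    n = len(A)
    return [
        mix(A, [sample_source(alpha, rng) for _ in range(n)])
        for _ in range(count)
    ]
```

For example, `sample_observations([[1.0, 2.0], [0.0, 1.0]], alpha=1.5, count=1000)` returns 1000 two-dimensional observations whose empirical variance does not settle as the sample grows, which is the regime the abstract targets.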

    Non-Euclidean Differentially Private Stochastic Convex Optimization

    Differentially private (DP) stochastic convex optimization (SCO) is a fundamental problem, where the goal is to approximately minimize the population risk with respect to a convex loss function, given a dataset of i.i.d. samples from a distribution, while satisfying differential privacy with respect to the dataset. Most of the existing works in the literature of private convex optimization focus on the Euclidean (i.e., $\ell_2$) setting, where the loss is assumed to be Lipschitz (and possibly smooth) w.r.t. the $\ell_2$ norm over a constraint set with bounded $\ell_2$ diameter. Algorithms based on noisy stochastic gradient descent (SGD) are known to attain the optimal excess risk in this setting. In this work, we conduct a systematic study of DP-SCO for $\ell_p$-setups. For $p=1$, under a standard smoothness assumption, we give a new algorithm with nearly optimal excess risk. This result also extends to general polyhedral norms and feasible sets. For $p\in(1, 2)$, we give two new algorithms, whose central building block is a novel privacy mechanism, which generalizes the Gaussian mechanism. Moreover, we establish a lower bound on the excess risk for this range of $p$, showing a necessary dependence on $\sqrt{d}$, where $d$ is the dimension of the space. Our lower bound implies a sudden transition of the excess risk at $p=1$, where the dependence on $d$ changes from logarithmic to polynomial, resolving an open question in prior work [TTZ15]. For $p\in (2, \infty)$, noisy SGD attains optimal excess risk in the low-dimensional regime; in particular, this proves the optimality of noisy SGD for $p=\infty$. Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.
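
As a rough illustration of the noisy SGD baseline mentioned in this abstract for the Euclidean setting, the sketch below performs one update with Gaussian noise added to the gradient. It is not the paper's new mechanism for $p \in (1, 2)$. The gradient clipping step, the function names, and the parameters `clip` and `sigma` are assumptions made for illustration; in particular, `sigma` is a free parameter here and is not calibrated to any specific privacy budget.

```python
import math
import random


def clip_l2(grad, bound):
    """Scale `grad` so its Euclidean norm is at most `bound` (assumed step)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= bound or norm == 0.0:
        return list(grad)
    return [g * bound / norm for g in grad]


def noisy_sgd_step(w, grad, lr, clip, sigma, rng):
    """One gradient step with independent Gaussian noise on each coordinate.

    `sigma` is the noise standard deviation; choosing it to meet a given
    privacy guarantee is outside the scope of this sketch.
    """
    clipped = clip_l2(grad, clip)
    noisy = [g + rng.gauss(0.0, sigma) for g in clipped]
    return [w_i - lr * g for w_i, g in zip(w, noisy)]
```

Calling `noisy_sgd_step(w, grad, lr=0.1, clip=1.0, sigma=0.5, rng=random.Random(0))` inside a training loop gives the usual pattern: bound each gradient's sensitivity, perturb it, then step.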

    A new over-dispersed count model

    A new two-parameter discrete distribution, namely the PoiG distribution, is derived by the convolution of a Poisson variate and an independently distributed geometric random variable. This distribution generalizes both the Poisson and geometric distributions and can be used for modelling over-dispersed as well as equi-dispersed count data. A number of important statistical properties of the proposed count model are derived, such as the probability generating function, the moment generating function, the moments, the survival function and the hazard rate function. Monotonic properties, such as log-concavity and stochastic ordering, are also investigated in detail. Method-of-moments and maximum likelihood estimators of the parameters of the proposed model are presented. It is envisaged that the proposed distribution may prove useful to practitioners for modelling over-dispersed count data compared to its closest competitors.
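
The convolution construction described in this abstract can be sketched numerically as follows. This is a minimal illustration, not the paper's derivation: the parameter names `lam` and `theta`, and the choice of a geometric variable counting failures before the first success (support 0, 1, 2, ...), are assumptions, and the paper may use a different parameterization. Under this assumed parameterization the mean is `lam + (1 - theta) / theta` and the variance is `lam + (1 - theta) / theta**2`, so the variance is never smaller than the mean.

```python
import math


def poisson_pmf(k, lam):
    """Poisson(lam) probability of k, computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def geometric_pmf(k, theta):
    """Probability of k failures before the first success (assumed form)."""
    return theta * (1.0 - theta) ** k


def poig_pmf(k, lam, theta):
    """P(N + G = k) for independent N ~ Poisson(lam) and G ~ geometric(theta)."""
    return sum(
        poisson_pmf(j, lam) * geometric_pmf(k - j, theta) for j in range(k + 1)
    )


def poig_mean_var(lam, theta, kmax=150):
    """Mean and variance from the pmf truncated at kmax (numerical check)."""
    probs = [poig_pmf(k, lam, theta) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var
```

Comparing `poig_mean_var(lam, theta)` with the closed-form moments above is a quick sanity check that the convolution sum is implemented correctly.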

    A genome-wide association study identifies risk alleles in plasminogen and P4HA2 associated with giant cell arteritis

    Giant cell arteritis (GCA) is the most common form of vasculitis in individuals older than 50 years in Western countries. To shed light onto the genetic background influencing susceptibility for GCA, we performed a genome-wide association screening in a well-powered study cohort. After imputation, 1,844,133 genetic variants were analysed in 2,134 cases and 9,125 unaffected controls from ten independent populations of European ancestry. Our data confirmed HLA class II as the strongest associated region (independent signals: rs9268905, P = 1.94E-54, per-allele OR = 1.79; and rs9275592, P = 1.14E-40, OR = 2.08). Additionally, PLG and P4HA2 were identified as GCA risk genes at the genome-wide level of significance (rs4252134, P = 1.23E-10, OR = 1.28; and rs128738, P = 4.60E-09, OR = 1.32, respectively). Interestingly, we observed that the association peaks overlapped with different regulatory elements related to cell types and tissues involved in the pathophysiology of GCA. PLG and P4HA2 are involved in vascular remodelling and angiogenesis, suggesting a high relevance of these processes for the pathogenic mechanisms underlying this type of vasculitis

    Heavy-Tailed Analogues of the Covariance Matrix for ICA

    Independent Component Analysis (ICA) is the problem of learning a square matrix $A$, given samples of $X = AS$, where $S$ is a random vector with independent coordinates. Most existing algorithms are provably efficient only when each $S_i$ has a finite and moderately valued fourth moment. However, there are practical applications where this assumption need not be true, such as speech and finance. Algorithms have been proposed for heavy-tailed ICA, but they are not practical, using random walks and the full power of the ellipsoid algorithm multiple times. The main contributions of this paper are: (1) A practical algorithm for heavy-tailed ICA that we call HTICA. We provide theoretical guarantees and show that it outperforms other algorithms in some heavy-tailed regimes, both on real and synthetic data. Like the current state-of-the-art, the new algorithm is based on the centroid body (a first moment analogue of the covariance matrix). Unlike the state-of-the-art, our algorithm is practically efficient. To achieve this, we use explicit analytic representations of the centroid body, which bypass the use of the ellipsoid method and random walks. (2) We study how heavy tails affect different ICA algorithms, including HTICA. Somewhat surprisingly, we show that some algorithms that use the covariance matrix or higher moments can successfully solve a range of ICA instances with infinite second moment. We study this theoretically and experimentally, with both synthetic and real-world heavy-tailed data.

    Bioengineered cellular and cell membrane-derived vehicles for actively targeted drug delivery: So near and yet so far
